Course 5
An Evolutionist View Over Conceptualization in Language
Dan Cristea
This course is based on a talk given at the COST A31 project meeting, Budapest, May 2008.
Some pictures belong to Luc Steels.

Context of my talk, the subject matter
• The A-31 COST project “Stability and Adaptation of Classification Systems in a Cross-Cultural Perspective”
  – classification, hierarchies, conceptualisation …
• How did concepts arise in humans and how have they been expressed in language?
  – Why do we speak the way we do?
  – Why are there so many languages although most of us operate with the same concepts?
  – Is there a way to prove scientifically hypotheses about the evolution of languages?

The “Talking Heads” experiment
• Goal: how did language evolve?
  – In a community, over time: language emerges through self-organisation
  – In individuals, meaning is built in a cumulative growth process
• The ALEAR project (2008-2010)
  – Coordinator: Luc Steels
  – SONY Laboratories, Paris

ALEAR – Main Objective
“Carefully controlled experiments in which autonomous humanoid robots self-organise rich conceptual frameworks and communication systems with similar features as those found in human languages”

Synthesis of intelligence approaches
• knowledge-based or symbolic: operationalizing models from logic, generative linguistics and cognitive psychology
• machine learning: copy intelligence by learning
• behaviour-based: put the study of AI on biological grounds (see also earlier cybernetics)
→ neural: copy the physical realization of the brain, make variations and study the effects

Approaches in studying the evolution of language
• Whole systems approach: physical embodiment, sensory-motor, perception, conceptualisation, language
• Self-generated: as opposed to designed or acquired through inductive machine learning
• Multi-agent: as opposed to stand-alone
• Evolutionary: start from scratch and see how a communication system forms and further develops

Setting
• Cognitive agents:
  – Physical aspects:
    • body
    • sensors
    • articulators
    • physical location
    • objects and agents located in the environment
  – Mental properties:
    • behaviour
    • memory
    • lexicon
    • grammar
    • etc.
• The two aspects are separated: a real agent exists only when a virtual agent is loaded in a physical robot body

The robots
• Physically embodied autonomous agents
  – Motor-sensory processing
    • perception
    • movements
    • actions
  – Conceptual processing
    • recognise objects
    • learn a lexicon
    • build representations of concepts
    • towards the development of grammar

Teleporting
• An agent develops categories in one location and enriches its learning experience by moving to another location
• The transmission of language from one generation to the next
• Intercultural exchange and language contacts by migrating mental bodies to different parts of the world

The “guessing game”
• Two physically instantiated agents: speaker and hearer
• Why a game?
  – Because neither agent can look into the mind of the other
  – They only interact through the external environment
• What triggers the game behaviour?
  – There is an innate, programmed motivation: agents try to maximise their communicative success
  – This comes from the necessity to survive

Interaction games
• Rules:
  – agents can interact only in response to stimuli coming from the external environment
  – they are bound to maximise their communicative success (programmed)
• Acquisitions:
  – guessing games: develop the vocabulary and abstract concepts by saying/guessing/pointing
  – develop space and time conceptualisations: events
  – develop rudiments of syntax
Sussex University, NLP seminar, 17 March 2011

The protocol of a game
- One agent (speaker) chooses an object (focus), conceptualises it and emits a string description conforming to his lexicon
- The second agent (hearer) parses that description, matches it against his own conceptualisations and returns one object he believes is the focus object
- If they match, the game is successful → both agents increase their confidence in the mapping between conceptual space and lexicon for the words they used
- If the game fails → they decrease their confidence and the hearer learns another connection between the real focus and the description string
COST A31 – WG1, Budapest, May 2008

The Mondriaan Experiment

Perception and categorisation
• Sensory channels
  – software processes interpreting specific real world information:
    • HPOS (horizontal position)
    • VPOS (vertical position)
    • GRAY (gray level)
    • others
  – domain: 0-10 (continuous), discretised to a number of discrete values → categorisation
  – categorisation could be specific to individuals

The categories trees
• Each individual can develop his own tree of categories for each sensory channel
• Example for HPOS:
  – agent A: left, right
  – agent B: left (extreme left, middle left), right (middle right, extreme right)

Categorisation in individuals
Important notice: the symbols left and middle left are our conventions to notate the HPOS property values; they do not belong to the acquired lexicon!
This object will be interpreted as…
left by agent A, middle left by agent B
[Figure: processing levels: Sensori-motor, Conceptual, Form]

About perception again: why do we use some features and not others?
• Salience: the property of one feature to distinguish the topic in the context:
  – the minimum distance between the topic’s value for that feature and all the other objects’ values for that feature
[Figure: a scene with objects 1, 2, 3 and the channels HPOS, VPOS, HEIGHT, WIDTH, GRAY, AREA]

About perception again: why do we use some features and not others?
After scaling:
obj   HPOS   VPOS   HEIGHT   WIDTH   GRAY   AREA
1     0.25   0.45   0.30     0.66    0.45   0.70
2     0.20   0.32   0.40     0.50    0.90   0.74
3     0.42   0.31   0.50     0.30    0.42   0.76
sal   0.05   0.01   0.10     0.16    0.45   0.02
(The sal row is consistent with object 2 as the topic; GRAY is then its most salient channel.)

Lexicalisation – associating meanings to words
Game 125
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[right]
B stores “mo” as HPOS[right]

Lexicalisation – interpretation
Game 205
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B interprets “mo” as HPOS[right]
B points to the topic
A says “OK”

Lexicalisation – synonymy
Game 245
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[right]
B has a word for HPOS[right]: “mogash”
B stores “mo” as a synonym for “mogash”

Differences in conceptualization produce “subtle” social polysemy
Game 280
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[middle right]
B stores “mo” as HPOS[middle right]
• A has a two-valued conceptualization of HPOS
• B has a four-valued conceptualization of HPOS

Subtle social differences in meaning can give rise to generalisations
Game 302
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B knows “mo” as HPOS[middle right]
B does not recognize
an object in the scene having this value
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[extreme right]
B stores “mo” as HPOS[extreme right]
Now B knows “mo” as both HPOS[middle right] and HPOS[extreme right]. By repetition he can infer a new category which subsumes both, which should be HPOS[right], and this will be called “mo”.

Ambiguity-1
Game 325
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as HPOS[right] and VPOS[low] and GRAY[light]
B stores “mo” as HPOS[right] OR VPOS[low] OR GRAY[light]
However, by positive feedback the lexicon will converge towards an efficient usage.

Recovering from ambiguity
It is not known a priori whether “mo” will be stabilized by B as only HPOS[right] or only VPOS[low] or the union of the two.
An example:
Game 340
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B knows “mo” as HPOS[right] OR VPOS[low] OR GRAY[light]
B recognizes an object in the scene with the value HPOS[right] and no object with the value VPOS[low] or GRAY[light]
B points to the topic
A says “OK”
B diminishes the meaning of “mo” as VPOS[low] and GRAY[light] and augments its meaning as HPOS[right]

Ambiguity-2
Suppose Game 325 takes place as follows instead:
Game 325’
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B does not know “mo”
B says: “mo?”
A points to the topic
B categorizes the topic as VPOS[low]
B stores “mo” as VPOS[low]
At this moment A and B understand different concepts by “mo”.

Ambiguity maintained
After Game 325’ A knows “mo” as meaning HPOS[right] and B acquired it as VPOS[low].
Then we have this:
Game 390
A segments the scene into 2 objects
A categorizes the topic as HPOS[right]
A says: “mo”
B knows “mo” as VPOS[low]
B recognizes an object in the scene with the value VPOS[low]
B points to the
topic
A says “OK”
Once again, the two agents do not realize that they give different meanings to “mo”.

What is proved?
• A lexicon in a single agent
  – new words are invented or adopted
  – scores of association between forms (words) and meanings (concepts) go up and down
  – a “virgin”, “newly born” agent will catch up with a lexicon already existing in a population
• A common lexicon that stabilizes in a group of agents
  – however, differences could still coexist
  – the lexicon is sensitive to the growth or reduction of the population
  – the lexicon can absorb some shocks of contacts with other groups or can be destabilized

What do the plots show?
• Ontology size as compared to communication success
• Communication success dips each time the ontology is enlarged with new concepts, since new words have to be invented to deal with them
• However, the agents clearly manage to become successful in guessing again
[Plot: communication success and ontology size]

What do the plots show?
• Increasing the population size
• The agents create a word without knowing that a word already exists somewhere in the population (as it takes time to propagate) → the risk of synonymy increases
• However, steady progress towards an effective communication system is noticed

What do the plots show?
• What happens when two populations interact?
• There is an initial destabilisation period when the coherence is low as the ambiguity increases
• However, the new community catches up and a new common lexicon emerges, abundant in synonyms

How to prove the origins of language?
• In a simplified form: language = lexicon + grammar
  – The lexicon gives names for concepts and objects: the guessing games
  – The grammar expresses relations between concepts and objects: how to put it in terms of interactions between agents?
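
Two mechanisms from the lexicon experiments above can be sketched in a few lines of Python: the salience of each channel (the minimum distance between the topic's value and every other object's value) and the rise and fall of word-meaning scores after a game. This is only an illustrative sketch, not the code of the Talking Heads experiment. The scene values are the scaled ones from the salience slide; the class name `Lexicon`, the initial score of 0.5 and the step of 0.1 are assumptions made for the example.

```python
from collections import defaultdict

# Scaled channel values for objects 1-3, as listed on the salience slide.
SCENE = {
    1: {"HPOS": 0.25, "VPOS": 0.45, "HEIGHT": 0.30, "WIDTH": 0.66, "GRAY": 0.45, "AREA": 0.70},
    2: {"HPOS": 0.20, "VPOS": 0.32, "HEIGHT": 0.40, "WIDTH": 0.50, "GRAY": 0.90, "AREA": 0.74},
    3: {"HPOS": 0.42, "VPOS": 0.31, "HEIGHT": 0.50, "WIDTH": 0.30, "GRAY": 0.42, "AREA": 0.76},
}


def salience(scene, topic):
    """Per channel: smallest distance from the topic's value to any other object's value."""
    others = [values for obj, values in scene.items() if obj != topic]
    return {
        channel: round(min(abs(value - other[channel]) for other in others), 2)
        for channel, value in scene[topic].items()
    }


class Lexicon:
    """Word-meaning association scores of one agent (hypothetical update rule)."""

    def __init__(self, step=0.1):
        self.step = step                 # assumed size of one score adjustment
        self.scores = defaultdict(dict)  # word -> {meaning: score in [0, 1]}

    def adopt(self, word, meanings, initial=0.5):
        """Store every candidate meaning of an unknown word, as B does in Game 325."""
        for meaning in meanings:
            self.scores[word].setdefault(meaning, initial)

    def interpret(self, word):
        """Return the best-scoring meaning, or None if the word is unknown."""
        candidates = self.scores.get(word)
        return max(candidates, key=candidates.get) if candidates else None

    def success(self, word, meaning):
        """After a successful game: augment the meaning used, diminish its competitors."""
        for m, score in self.scores[word].items():
            delta = self.step if m == meaning else -self.step
            self.scores[word][m] = round(min(1.0, max(0.0, score + delta)), 2)

    def failure(self, word, meaning):
        """After a failed game: diminish the meaning that was tried."""
        score = self.scores[word][meaning]
        self.scores[word][meaning] = round(max(0.0, score - self.step), 2)


sal = salience(SCENE, topic=2)          # reproduces the "sal" row of the slide
best_channel = max(sal, key=sal.get)    # the channel the speaker would prefer

b = Lexicon()
b.adopt("mo", ["HPOS[right]", "VPOS[low]", "GRAY[light]"])  # Game 325
b.success("mo", "HPOS[right]")                              # Game 340
```

With object 2 as the topic, `salience` reproduces the sal row of the slide and GRAY comes out as the most distinctive channel. After the two calls on `b`, HPOS[right] outscores the two competing meanings of “mo”, which mirrors the recovery from ambiguity in Game 340.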

Guessing games: implicit assumptions
• Experiments are made in a controlled setting that simplifies many aspects:
  – The world is simplified to the content of the table (scene)
  – The agents have their attention focused on the scene
  – Their “aim” is to identify objects (they are motivated)
  – The agents have a set of channels which are sufficient to reveal identifying properties of the objects populating the scene
  – The maximum vocabulary sufficient to cover the concepts, as values produced by channels, is strictly limited
  – The words used by the agents apply to property values and not to objects
• This setting is assumed (given, programmed)

Establishing settings for exercising the birth of a grammar
• Setting 1:
  – The world: the content of the table (scene)
  – Attention: focused on the scene
  – Motivation: identifying objects
  – Channels: identify properties of objects (enriched)
  – Maximum vocabulary: strictly limited
  – The agents already share a background vocabulary for naming properties of objects (as in phase 1), not directly objects
  – Only one property is not enough for disambiguation → necessity to use combinations of words to express conjunctions
  – Implicit supposition: combinations express conjunction and not disjunction

Putting words together
New channels:
  COLOR: black, red, blue
  SHAPE: circle, triangle
[Figure: scene with objects 1, 2, 3, 4]
Important: the above symbols are our conventions to notate the mentioned property values; they do not belong to the acquired lexicon!
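
Setting 1 says that a single property may not be enough to single out the topic, so the speaker needs a conjunction of property values. A minimal sketch of that search follows; it is a brute-force enumeration under stated assumptions, not the decision-tree procedure the agents use. The four-object scene, the object names and the function name `discriminating_conjunctions` are hypothetical.

```python
from itertools import combinations

# Hypothetical scene: four objects described by the VPOS, SHAPE and COLOR channels.
SCENE = {
    "obj1": {"VPOS": "low", "SHAPE": "circle", "COLOR": "red"},
    "obj2": {"VPOS": "high", "SHAPE": "circle", "COLOR": "red"},
    "obj3": {"VPOS": "low", "SHAPE": "triangle", "COLOR": "blue"},
    "obj4": {"VPOS": "high", "SHAPE": "triangle", "COLOR": "black"},
}


def discriminating_conjunctions(scene, topic):
    """Return all smallest conjunctions of (channel, value) pairs that fit only the topic."""
    topic_props = sorted(scene[topic].items())
    others = [props for name, props in scene.items() if name != topic]
    for size in range(1, len(topic_props) + 1):
        hits = [
            combo
            for combo in combinations(topic_props, size)
            # A conjunction discriminates if no other object satisfies all its terms.
            if not any(all(o.get(ch) == val for ch, val in combo) for o in others)
        ]
        if hits:
            return hits
    return []  # the topic cannot be told apart with these channels


conjunctions = discriminating_conjunctions(SCENE, "obj1")
```

In this scene no single property value identifies obj1, while VPOS[low] combined with either SHAPE[circle] or COLOR[red] does. Because the combination is read as a conjunction, the order in which the corresponding words are uttered does not matter.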

Putting words together
Game 1014
A segments the scene into 4 objects
A categorizes the topic as {VPOS[low], SHAPE[circle], COLOR[red]}
A has the lexicon:
  “bagadiru” for VPOS[low]
  “gugeawa” for SHAPE[circle]
  “camende” for COLOR[red]
A correctly identifies on the decision tree that VPOS[low] AND SHAPE[circle] are sufficient to identify the topic
A says: “bagadiru gugeawa”
B has the lexicon:
  “bagadiru” for VPOS[low]
  “camende” for COLOR[red]
B says: “gugeawa?”
A points to the topic
B categorizes the topic as {VPOS[low], SHAPE[circle], COLOR[red]}
B correctly discovers on the decision tree that either SHAPE[circle] or COLOR[red], in combination with VPOS[low], is sufficient to identify the topic
B stores “gugeawa” as both SHAPE[circle] and COLOR[red], with a confidence = 0.5

Conclusions of a set of experiments of this kind
• Will not give rise to a grammar for expressing conjunctions: the implicit assumption was that putting words together restricts the selection (in conformity with most modern languages)
• The order of words is not important: “red circular” or “circular red”

One step further in building a grammar
• Setting 2:
  – The world: the content of the table (scene)
  – Attention: focused on the scene
  – Motivation: identifying spatial relations among objects
  – Channels: identify properties of objects BUT ALSO spatial relations among objects
  – Maximum vocabulary: strictly limited
  – The agents have as background a common vocabulary for naming properties of objects (as in phase 1), not directly objects
  – Implicit supposition: in a linear expression Obj1 R Obj2, the focus is Obj1

New channels expressing spatial relations
• HREL: left-of, right-of
• VREL: above, below
[Figure: scene with objects 1, 2, 3, 4]
obj1 left-of obj2 (0.9)
obj1 left-of obj3 (0.4)
obj1 left-of obj4 (0.6)
obj3 left-of obj2 (0.1)
obj3 left-of obj4 (0.9)
…

But grammar is all about form
“HREL[right-of(obj2)]” expresses the concept “whatever is to the right of obj2”
One way to say that the object in this relation is obj1:
“obj1 lexical-item-for(right-of) obj2”
But the agent knows a term for “right”: “mo”
Then he might combine this term with a new word expressing the concept of “relation”, for instance “ga”: “mo-ga”

Expressing relations between objects
Initial lexicon:
  “mo” = HPOS[right]
  “bagadiru” = VPOS[low]
  “gugeawa” = SHAPE[circle]
  “zamira” = SHAPE[triangle]
  “camende” = COLOR[red]
  “gamaru” = COLOR[gray]
Derived expressions:
  “zamira mo-ga gugeawa”
[Figure: scene with objects 1-5]

Expressing relations between objects
Initial lexicon:
  “mo” = HPOS[right]
  “bagadiru” = VPOS[low]
  “gugeawa” = SHAPE[circle]
  “zamira” = SHAPE[triangle]
  “camende” = COLOR[red]
  “gamaru” = COLOR[gray]
Derived expressions:
  “camende gugeawa mo-ga bagadiru gamaru gugeawa”
[Figure: scene with objects 1-5]

Guessing relations
Game 2020
A segments the scene into 5 objects
A categorizes the topic as {VPOS[low], HPOS[right], SHAPE[circle], COLOR[red], HREL[right-of(SHAPE[circle])], HREL[right-of(COLOR[gray])], …}
Both A and B have the lexicon:
  “mo” = HPOS[right]
  “bagadiru” = VPOS[low]
  “gugeawa” = SHAPE[circle]
  “zamira” = SHAPE[triangle]
  “camende” = COLOR[red]
  “gamaru” = COLOR[gray]
A correctly identifies on the decision tree that HREL[right-of(obj1)] is sufficient to identify the topic
A identifies obj3 as {SHAPE[circle], VPOS[low], COLOR[red]}
A says: “camende gugeawa mo-ga bagadiru gamaru gugeawa”
B says: “mo-ga?”
A points to the topic
B identifies “camende gugeawa” as either obj2 or obj3 but, based on A’s pointing, eliminates obj2
B identifies “bagadiru gamaru gugeawa” as obj1
B stores “mo-ga” as HREL[right-of()]

How to diminish the amount of initial assumptions?
• Remember our Setting 2:
  – …
  – Implicit supposition: in a linear expression Obj1 R Obj2, the focus is Obj1
• This is a direct and artificial interference in the very heart of the birth process of the language!
• Solution: parameterize all grammatical features of the language and let them evolve naturally

• Word order and prepositions
  John gave Mary a book in the library
• Affixes
  Das Mädchen gibt den schweren Koffer ihres Bruders den Freunden
  The girl gives the heavy suitcase (of) her brother (to) the friends
• Inflection
  Brutus Marcello librum dedit
  Brutus Marcellus book gave
  Brutus gave a book to Marcellus
  [Source: Palmer, p. 8]
• Particles
  Tanaka-san wa Tokyo de o-to-san ni atta
  Tanaka TOPIC Tokyo (in) father (loc) meet
  Tanaka met his father in Tokyo
from Luc Steels

Examples of semantic roles:
  Agent: the instigator of the event
  Counter-agent: the force or resistance against which the action is carried out
  Object: the entity that moves or changes or whose position or existence is in consideration
  Result: the entity that comes into existence as a result of the action
  Instrument: the stimulus or immediate physical cause of the event
  Source: the place from which something moves
  Goal: the place to which something moves
  Experiencer: the entity which receives or accepts or experiences or undergoes the effect of an action
Source: Fillmore
from Luc Steels

Fluid Construction Grammar
• Structures to represent the information needed in language processing about a specific sentence (feature structures)
• Structures to represent the lexical and grammatical constructions (rules)
• Operations of Unify and Merge
• Structures specifying how new rules are built (templates)
a chemical metaphor
from Luc Steels

[Figure: processing levels: Sensori-motor, Conceptual, Grammatical, Form]

Constructions
• Semantic and syntactic categories do not operate in isolation
• They are part of frames (schemas, patterns)
• Constructions are mappings from semantic to syntactic schemas
• Constructions can also add meaning to the meaning of the parts, and add form
Fillmore, Kay, Michaelis, Croft, Goldberg, …
from Luc Steels

Conclusion
• The language is considered from an evolutionistic point of view
• Lexicon formation:
proved experimentally
• Grammar and dialogue: under study in ALEAR
  – Perhaps Fluid Construction Grammar
    • integrates syntax and semantics (Fillmore’s roles)
    • unification and merge mechanisms

Thank you…
Research funded by the EC FP7 ALEAR project (2008-2011)